Random Walk Factoid Annotation for Collective Discourse
Abstract
In this paper, we study the problem of automatically annotating the factoids present in collective discourse. Factoids are information units that are shared between instances of collective discourse and may have many different ways of being realized in words. Our approach divides this problem into two steps, using a graph-based approach for each step: (1) factoid discovery, finding groups of words that correspond to the same factoid, and (2) factoid assignment, using these groups of words to mark collective discourse units that contain the respective factoids. We study this on two novel data sets: the New Yorker caption contest data set, and the crossword clues data set.
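The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the graphs are built, so the co-occurrence graph, the random walk with restart, the `restart`, `iterations`, and `top_k` parameters, the function names, and the toy captions are all assumptions made for illustration. The assignment step here is a simple word-overlap stand-in rather than a graph-based method.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(units):
    """Undirected word graph; edge weight = number of units containing both words."""
    graph = defaultdict(lambda: defaultdict(float))
    for unit in units:
        words = sorted(set(unit.lower().split()))
        for a, b in combinations(words, 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def random_walk_with_restart(graph, seed, restart=0.3, iterations=50):
    """Iterate visit probabilities of a walk that jumps back to `seed`
    with probability `restart` at every step."""
    scores = {node: 0.0 for node in graph}
    scores[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            total = sum(graph[node].values())
            for neighbor, weight in graph[node].items():
                nxt[neighbor] += (1.0 - restart) * mass * weight / total
        nxt[seed] += restart
        scores = nxt
    return scores


def discover_factoid_words(graph, seed, top_k=2):
    """Step 1 (assumed form): the top_k words most visited by the walk from `seed`."""
    scores = random_walk_with_restart(graph, seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:top_k])


def assign_factoid(units, factoid_words):
    """Step 2 (overlap stand-in): indices of units containing any factoid word."""
    return [
        i for i, unit in enumerate(units)
        if factoid_words & set(unit.lower().split())
    ]


# Hypothetical toy captions, not taken from either data set in the paper.
units = ["the boss wants coffee", "my boss needs coffee", "the cat sleeps"]
graph = build_cooccurrence_graph(units)
group = discover_factoid_words(graph, "boss", top_k=2)
print(group, assign_factoid(units, group))
```

The restart probability keeps the walk near the seed word, so words that often co-occur with it score higher than words reachable only through frequent function words.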
Similar resources
Discourse Complements Lexical Semantics for Non-factoid Answer Reranking
We propose a robust answer reranking model for non-factoid questions that integrates lexical semantics with discourse information, driven by two representations of discourse: a shallow representation centered around discourse markers, and a deep one based on Rhetorical Structure Theory. We evaluate the proposed model on two corpora from different genres and domains: one from Yahoo! Answers and ...
Agreement in Human Factoid Annotation for Summarization Evaluation
Factoid analysis was introduced by van Halteren and Teufel (2003) as an objective, yet semantics-oriented way of measuring overlap of information rather than surface strings in summaries. In this paper, we report on annotation experiments with two sets of summaries, and on a factoid-pairing program which finds correlations between factoids semi-automatically.
Evaluating Information Content by Factoid Analysis: Human annotation and stability
We present a new approach to intrinsic summary evaluation, based on initial experiments in van Halteren and Teufel (2003), which combines two novel aspects: comparison of information content (rather than string similarity) in gold standard and system summary, measured in shared atomic information units which we call factoids, and comparison to more than one gold standard summary (in our data: 2...
Refining Image Annotation by Integrating PLSA with Random Walk Model
In this paper, we present a new method for refining image annotation by integrating probabilistic latent semantic analysis (PLSA) with a random walk (RW) model. First, we construct a PLSA model with asymmetric modalities to estimate the posterior probability of each annotating keyword for an image, and then a label similarity graph is constructed by a weighted linear combination of label simil...
Collective Media Annotation using Random Field Models
We present methods for semantic annotation of multimedia data. The goal is to detect semantic attributes (also referred to as concepts) in clips of video via analysis of a single keyframe or set of frames. The proposed methods integrate high performance discriminative single concept detectors in a random field model for collective multiple concept detection. Furthermore, we describe a generic f...